Skip to content

provider/akamai: return early when there are no relevant changes#6189

Open
endocrimes wants to merge 2 commits intokubernetes-sigs:masterfrom
endocrimes:dani/akadns
Open

provider/akamai: return early when there are no relevant changes#6189
endocrimes wants to merge 2 commits intokubernetes-sigs:masterfrom
endocrimes:dani/akadns

Conversation

@endocrimes
Copy link
Member

What does it do ?

Currently, if a cluster has multiple instances of external-dns, with
different domain filters (e.g across multiple providers), the Akamai
provider will repeatedly re-create all of its zones records, even when
there are no changes relevant to its filtered set.

While this isn't inherently a problem for a single cluster controlling
an entire zone, if you have many clusters updating records in a single
zone, the chances of these no-op operations conflicting, failing, and
resulting in the exit of external-dns is quite high. This can prevent
real changes from being applied.

This PR adds defensive filtering and exiting when no relevant
changes are detected, based pretty heavily on the implementations of
other, more active providers.

Motivation

Production issues with external-dns regularly crashing when it conflicts with other instances for no-op change applications.

More

  • Yes, this PR title follows Conventional Commits
  • Yes, I added unit tests
  • Yes, I updated end user documentation accordingly (There isn't really documentation for this behavior)

Signed-off-by: Danielle Lancashire <dani@builds.terrible.systems>
Currently, if a cluster has multiple instances of external-dns, with
different domain filters (e.g across multiple providers), the Akamai
provider will repeatedly re-create all of its zones records, even when
there are no changes relevant to its filtered set.

While this isn't inherently a problem for a single cluster controlling
an entire zone, if you have many clusters updating records in a single
zone, the chances of these no-op operations conflicting, failing, and
resulting in the exit of external-dns is quite high. This can prevent
real changes from being applied.

This commit adds defensive filtering and exiting when no relevant
changes are detected, based pretty heavily on the implementations of
other, more active providers.

Signed-off-by: Danielle Lancashire <dani@builds.terrible.systems>
@k8s-ci-robot k8s-ci-robot added provider Issues or PRs related to a provider size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 11, 2026
@coveralls
Copy link

Pull Request Test Coverage Report for Build 21906596659

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 37 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.02%) to 79.176%

Files with Coverage Reduction New Missed Lines %
akamai/akamai.go 37 73.19%
Totals Coverage Status
Change from base Build 21899020757: 0.02%
Covered Lines: 16060
Relevant Lines: 20284

💛 - Coveralls

Copy link
Member

@ivankatliarchuk ivankatliarchuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivankatliarchuk
Copy link
Member

Could you have a look how it is done on AWS for example?

We have a common interface - zonesToTagFilter

for _, zone := range resp.HostedZones {

Another thing, if you have any interest, are you willing to implement Zone Cache first in another PR https://github.com/kubernetes-sigs/external-dns/blob/master/provider/blueprint/zone_cache.go? Implementing cache will reduce a drift in this provider.

For most of changes, we expect evidences similar to #5085 (comment)

As we have no other way to validate that the feature is working.

@endocrimes
Copy link
Member Author

endocrimes commented Feb 11, 2026

@ivankatliarchuk It was actually pretty hard to find a pattern for this - because a bunch of providers end up implementing this really subtly (by going from their configured zones -> records to update, rather than working from the changeset -> things to update).

Just filtering the result of zones (a la the filter you've linked) doesn't actually solve the problem here (they're already filtered in https://github.com/endocrimes/external-dns/blob/9fbc1bcf96dea7951d57d6de89f33d21e1de4323/provider/akamai/akamai.go#L181-L208) - because the create/update/delete functions will re-create all records regardless of the change being applied, just logging the skipped non-applicable things that triggered them in the first place:

time="2026-02-11T10:34:05Z" level=debug msg="Fetched zone: 'prod.example.com' (ZoneID: REDACTED)"
time="2026-02-11T10:34:05Z" level=debug msg="Fetched zone: 'cluster.example.com' (ZoneID: REDACTED)"
time="2026-02-11T10:34:05Z" level=debug msg="Fetched '2' zones from Akamai"
time="2026-02-11T10:34:05Z" level=debug msg="Processing zones: [map[prod.example.com:prod.example.com cluster.example.com:cluster.example.com]]"
time="2026-02-11T10:34:05Z" level=debug msg="Create Changes requested [[*.ap-west.infra.notexample.com 0 IN A  REDACTED [] a-*.ap-west.infra.notexample.com 0 IN TXT  \"heritage=external-dns,external-dns/owner=...\" []]]"
time="2026-02-11T10:34:05Z" level=debug msg="Skipping Akamai Edge DNS creation of endpoint: '*.ap-west.infra.notexample.com' type: 'A', it does not match against Domain filters"
time="2026-02-11T10:34:05Z" level=debug msg="Skipping Akamai Edge DNS creation of endpoint: 'a-*.ap-west.infra.notexample.com' type: 'TXT', it does not match against Domain filters"
time="2026-02-11T10:34:05Z" level=error msg="Failed to create endpoints for DNS zone cluster.example.com. Error: Modification Confict: [Concurrent Zone Modification Error for cluster.example.com. Please try your request again.]"
time="2026-02-11T10:34:05Z" level=fatal msg="Failed to do run once: Modification Confict: [Concurrent Zone Modification Error for cluster.example.com. Please try your request again.]"

The AWS Provider's equivalent of the filter suggested here is part of the mapping that occurs here:

log.Debugf("Skipping record %s because no hosted zone matching record DNS Name was detected", *c.ResourceRecordSet.Name)
- The Akamai provider just doesn't switch to that style of representation today, and operates directly on the changes arrays provided by external-dns (it might be worth fixing, but is a much scarier and hard to test change than filtering).

Copy link
Member

@ivankatliarchuk ivankatliarchuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed PR change (early return on empty changes) only avoids fetchZones() when the changeset is literally empty. But every regular reconciliation cycle calls both
Records() (line 213) and ApplyChanges() - that's two fetchZones() API calls per loop iteration, and fetchZones hits p.client.ListZones() every time with no caching.

Proposed in separate PR to utilize blueprint.ZoneCache for provider.

The zone cache would eliminate redundant ListZones calls on every cycle (both Records and ApplyChanges would share the cache), while the early return only helps the edge case of empty changesets. They're complementary, but if I had to pick one, the cache gives far more value.

That said, both are small, independent fixes/proposal - the cache is a bigger win, and the edgeChangesByZone cleanup is still a correct bugfix regardless.

@@ -484,3 +501,15 @@ func edgeChangesByZone(zoneMap provider.ZoneIDName, endpoints []*endpoint.Endpoi

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// remove empty zones to avoid no-op API calls
for zone, eps := range createsByZone {
if len(eps) == 0 {
delete(createsByZone, zone)
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That eliminates the empty API calls at the source. Then filterEndpointsByZone and the second early return become unnecessary and can be removed. Still needs testing, could be that the unit tests are not right or smth.

@@ -256,6 +256,12 @@ func (p AkamaiProvider) Records(context.Context) ([]*endpoint.Endpoint, error) {

// ApplyChanges applies a given set of changes in a given zone.
func (p AkamaiProvider) ApplyChanges(_ context.Context, changes *plan.Changes) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (p AkamaiProvider) ApplyChanges(_ context.Context, changes *plan.Changes) error {
func (p AkamaiProvider) ApplyChanges(_ context.Context, changes *plan.Changes) error {
// return early if there is nothing to change
if len(changes.Create) == 0 && len(changes.Delete) == 0 && len(changes.UpdateNew) == 0 {
log.Info("All records are already up to date")
return nil
}
zoneNameIDMapper := provider.ZoneIDName{}
zones, err := p.fetchZones()
if err != nil {
log.Errorf("Failed to fetch zones from Akamai")
return err
}
for _, z := range zones.Zones {
zoneNameIDMapper[z.Zone] = z.Zone
}
log.Debugf("Processing zones: [%v]", zoneNameIDMapper)
// Create recordsets
log.Debugf("Create Changes requested [%v]", changes.Create)
if err := p.createRecordsets(zoneNameIDMapper, changes.Create); err != nil {
return err
}
// Delete recordsets
log.Debugf("Delete Changes requested [%v]", changes.Delete)
if err := p.deleteRecordsets(zoneNameIDMapper, changes.Delete); err != nil {
return err
}
// Update recordsets
log.Debugf("Update Changes requested [%v]", changes.UpdateNew)
if err := p.updateNewRecordsets(zoneNameIDMapper, changes.UpdateNew); err != nil {
return err
}
// Check that all old endpoints were accounted for
revRecs := changes.Delete
revRecs = append(revRecs, changes.UpdateNew...)
for _, rec := range changes.UpdateOld {
found := false
for _, r := range revRecs {
if rec.DNSName == r.DNSName {
found = true
break
}
}
if !found {
log.Warnf("UpdateOld endpoint '%s' is not accounted for in UpdateNew|Delete endpoint list", rec.DNSName)
}
}
return nil
}

// ApplyChanges applies a given set of changes in a given zone.
func (p AkamaiProvider) ApplyChanges(_ context.Context, changes *plan.Changes) error {
// return early if there is nothing to change
if len(changes.Create) == 0 && len(changes.Delete) == 0 && len(changes.UpdateNew) == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if len(changes.Create) == 0 && len(changes.Delete) == 0 && len(changes.UpdateNew) == 0 {
if !changes.HasChanges() {

and most likely log.Debug, not Info

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from ivankatliarchuk. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. provider Issues or PRs related to a provider size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants